feat: Add scorer that exposes helpers to evaluate agents by lforst · Pull Request #2146 · braintrustdata/braintrust-sdk-javascript

Luca Forstner (lforst) · 2026-06-19T14:56:06Z

import { Eval, agentAssertionScorer } from "braintrust";
import { z } from "zod";

await Eval("agent-behavior", {
  data: () => [], // dataset elided in this example
  task: async (input) => {}, // agent under test elided in this example
  scores: [
    agentAssertionScorer(({ output, expected, assert }) => [
      // Tool helpers
      assert.calledTool(
        "web_search",
        {
          input: { query: /capital of Estonia/i },
          times: 1,
        },
        "searches for the answer once",
      ),
      assert.calledTool(
        "summarize_source",
        {
          output: { citations: (value) => Array.isArray(value) },
          isError: false,
        },
        "summarizes sources successfully",
      ),
      assert.notCalledTool("send_email", "does not send email"),
      assert.toolOrder(
        ["web_search", "summarize_source"],
        "searches before summarizing",
      ),
      assert.maxToolCalls(3, "keeps tool use bounded"),
      assert.usedNoTools("does not need tools for memorized fact"),

      // Generic assertion helpers
      assert.contains(output, "Tallinn", "answers directly"),
      assert.equals(output.answer, expected.answer, "exact expected answer"),
      assert.notEquals(output.answer, "I don't know", "does not punt"),
      assert.contains(output.answer, /Tallinn/i, "mentions Tallinn"),
      assert.matches(
        output,
        z.object({
          answer: z.string(),
          citations: z.array(z.string()).min(1),
          confidence: z.number().min(0).max(1),
        }),
        "returns the expected shape",
      ),
    ]),
  ],
});
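
The tool helpers above accept literals, regular expressions, and predicate functions as matchers for a tool call's `input` and `output`. The following is a minimal sketch of how such matching could work, not the implementation in this PR. The names `ToolCall`, `ToolCallSpec`, `matchesPattern`, and `countMatchingCalls` are hypothetical and used only for illustration, and the partial-match semantics for nested objects are an assumption.

```typescript
// Hypothetical shape of a recorded tool call; the real SDK type may differ.
interface ToolCall {
  name: string;
  input?: unknown;
  output?: unknown;
  isError?: boolean;
}

// Hypothetical matcher spec mirroring the options used in the example above.
interface ToolCallSpec {
  input?: unknown;
  output?: unknown;
  isError?: boolean;
}

// A pattern may be a RegExp (tested against strings), a predicate function,
// a nested object of patterns (partial match, an assumption), or a literal.
function matchesPattern(pattern: unknown, value: unknown): boolean {
  if (pattern instanceof RegExp) {
    return typeof value === "string" && pattern.test(value);
  }
  if (typeof pattern === "function") {
    return Boolean((pattern as (v: unknown) => unknown)(value));
  }
  if (pattern !== null && typeof pattern === "object") {
    if (value === null || typeof value !== "object") return false;
    const record = value as Record<string, unknown>;
    return Object.entries(pattern as Record<string, unknown>).every(
      ([key, sub]) => matchesPattern(sub, record[key]),
    );
  }
  return Object.is(pattern, value);
}

// Counts the calls to `name` that satisfy every constraint present in `spec`.
function countMatchingCalls(
  calls: ToolCall[],
  name: string,
  spec: ToolCallSpec = {},
): number {
  return calls.filter(
    (call) =>
      call.name === name &&
      (spec.input === undefined || matchesPattern(spec.input, call.input)) &&
      (spec.output === undefined || matchesPattern(spec.output, call.output)) &&
      (spec.isError === undefined || (call.isError ?? false) === spec.isError),
  ).length;
}

// Illustrative trace, made up for this sketch.
const calls: ToolCall[] = [
  { name: "web_search", input: { query: "What is the capital of Estonia?" } },
  { name: "summarize_source", output: { citations: ["https://example.com"] } },
];

const count = countMatchingCalls(calls, "web_search", {
  input: { query: /capital of Estonia/i },
});
console.log(count);
```

Under this sketch, an option like `times: 1` would reduce to comparing the returned count against the expected number.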

Scorer output:


{
  name: "assertions",
  score: passedAssertions / totalAssertions,
  metadata: {
    assertions: [
      { name: "routes to expected department", passed: true },
      { name: "returns valid route shape", passed: false },
      { name: "called tool classify_ticket", passed: true },
    ],
    failed: [
      "returns valid route shape: expected output to match RouteSchema",
    ],
  },
}
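
To make the `score` field concrete, here is a rough sketch of how a list of assertion results could be folded into the shape shown above. It is an illustration under stated assumptions, not the code from this PR: `AssertionResult`, `AssertionScore`, and `summarizeAssertions` are hypothetical names, and scoring an empty assertion list as 0 is an assumption.

```typescript
// Hypothetical result of a single assertion; the real SDK types may differ.
interface AssertionResult {
  name: string;
  passed: boolean;
  message?: string; // failure reason, only meaningful when passed is false
}

// Mirrors the scorer output shape shown in the PR description.
interface AssertionScore {
  name: string;
  score: number;
  metadata: {
    assertions: { name: string; passed: boolean }[];
    failed: string[];
  };
}

function summarizeAssertions(results: AssertionResult[]): AssertionScore {
  const passedCount = results.filter((r) => r.passed).length;
  return {
    name: "assertions",
    // Assumption: an empty assertion list scores 0 instead of NaN.
    score: results.length === 0 ? 0 : passedCount / results.length,
    metadata: {
      assertions: results.map(({ name, passed }) => ({ name, passed })),
      // Failed entries are rendered as "name: message", as in the sample output.
      failed: results
        .filter((r) => !r.passed)
        .map((r) => (r.message ? `${r.name}: ${r.message}` : r.name)),
    },
  };
}

// The three assertions from the sample output: two pass, one fails.
const summary = summarizeAssertions([
  { name: "routes to expected department", passed: true },
  {
    name: "returns valid route shape",
    passed: false,
    message: "expected output to match RouteSchema",
  },
  { name: "called tool classify_ticket", passed: true },
]);
console.log(summary.score, summary.metadata.failed);
```

With the sample data, two of three assertions pass, so the score is the fraction 2/3 and `failed` holds the single formatted failure string.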

Luca Forstner (lforst) added 3 commits June 19, 2026 16:54

feat: Add scorer that exposes helpers to evaluate agents

0e00064

cs

1e9dc26

fixes

64362eb

Luca Forstner (lforst) wants to merge 3 commits into main from lforst/eval-run-assertions